Conversation

@will-rice
Contributor

@will-rice will-rice commented Apr 4, 2024

What does this PR do?

This is a fix for an issue I opened earlier today. I'll link it down below.

Fixes #7575

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines +507 to +513
else:
    conditioning = self.transformer_blocks[0].norm1.emb(
        timestep, class_labels, hidden_dtype=hidden_states.dtype
    )
    shift, scale = self.proj_out_1(F.silu(conditioning)).chunk(2, dim=1)
    hidden_states = self.norm_out(hidden_states) * (1 + scale[:, None]) + shift[:, None]
    hidden_states = self.proj_out_2(hidden_states)
Member

Isn't this practically the same as what we're doing when norm_type == "ada_norm"?

Contributor Author

It is similar, but ada_norm doesn't take class labels as an argument. I moved this into the else branch because the original condition was if norm_type != ada_norm_single. From what I can tell, that block still only supports the norm used in the original DiT implementation. It might be worth refactoring it to allow other norm types.

Member

We could then still condition on whether class_labels is not None, or something like that, no?

Contributor Author

If class_labels is None then you can't use AdaLayerNormZero. I'm not sure what the default norm should be when you want to condition on text or audio, but I picked AdaLayerNorm because it was similar to the zero variant without needing class labels.
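
For illustration, here is a rough, self-contained sketch of the distinction in plain PyTorch (not the actual diffusers classes; all names and sizes below are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy adaptive layer norm, conditioned on timestep only vs. timestep + class label.
dim, num_classes, num_train_timesteps = 64, 10, 1000

t_emb = nn.Embedding(num_train_timesteps, dim)   # timestep embedding
class_emb = nn.Embedding(num_classes, dim)       # class-label embedding
to_scale_shift = nn.Linear(dim, 2 * dim)
norm = nn.LayerNorm(dim, elementwise_affine=False)

x = torch.randn(2, 16, dim)                      # (batch, seq_len, dim)
t = torch.tensor([3, 7])
labels = torch.tensor([1, 2])

# "ada_norm"-style: conditioning comes from the timestep alone.
cond = t_emb(t)
# "ada_norm_zero"-style: class labels are folded into the conditioning as well,
# which is why that path can't be used when class_labels is None.
cond_with_labels = t_emb(t) + class_emb(labels)

scale, shift = to_scale_shift(F.silu(cond)).chunk(2, dim=1)
out = norm(x) * (1 + scale[:, None]) + shift[:, None]
print(out.shape)  # torch.Size([2, 16, 64])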

Member

How about ada_norm_single, i.e. the one used in PixArt-Alpha?

Contributor Author

If you use that one without additional arguments you get this error:

TypeError: PixArtAlphaCombinedTimestepSizeEmbeddings(
  (time_proj): Timesteps()
  (timestep_embedder): TimestepEmbedding(
    (linear_1): Linear(in_features=256, out_features=1408, bias=True)
    (act): SiLU()
    (linear_2): Linear(in_features=1408, out_features=1408, bias=True)
  )
) argument after ** must be a mapping, not NoneType

It requires the arguments resolution and aspect_ratio, but those could default to None because they aren't required when use_additional_conditions=False.
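
As a minimal sketch of what I mean (the constructor and forward signatures below are assumptions based on the diffusers version I was testing and may differ in other versions):

import torch
from diffusers.models.embeddings import PixArtAlphaCombinedTimestepSizeEmbeddings

# Assumed signatures; verify against your installed diffusers version.
emb = PixArtAlphaCombinedTimestepSizeEmbeddings(
    embedding_dim=1408, size_emb_dim=1408 // 8, use_additional_conditions=False
)
timestep = torch.tensor([1, 2], dtype=torch.long)

# With use_additional_conditions=False, resolution and aspect_ratio are unused,
# so passing None works. The TypeError above comes from the caller unpacking a
# None added_cond_kwargs dict with ** before this forward is even reached.
conditioning = emb(
    timestep,
    resolution=None,
    aspect_ratio=None,
    batch_size=2,
    hidden_dtype=torch.float32,
)
print(conditioning.shape)  # expected: torch.Size([2, 1408])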

Member

Do you think changing that block would be more appropriate?

@yiyixuxu WDYT?

Comment on lines 46 to 47
scale, shift = torch.chunk(emb, 2, dim=1)
x = self.norm(x) * (1 + scale[:, None, :]) + shift[:, None, :]
Member

Can you please explain the reasoning behind this set of changes?

Contributor Author

The default argument for dim in torch.chunk is dim=0, so it was splitting on the batch dimension. The second line change broadcasts the scale and shift to the correct shape, which is exactly what is done everywhere else in the implementation where scale and shift are used. I could change it to scale[:, None] and shift[:, None] to match those other places.
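
To illustrate with toy shapes (not the actual model dims):

import torch

batch, dim = 4, 8
emb = torch.randn(batch, 2 * dim)  # conditioning embedding, one row per sample

# Default torch.chunk splits on dim=0, i.e. across the batch: two (2, 16) halves.
a, b = torch.chunk(emb, 2)
print(a.shape, b.shape)  # torch.Size([2, 16]) torch.Size([2, 16])

# Splitting on dim=1 gives a per-sample scale and shift of shape (batch, dim).
scale, shift = torch.chunk(emb, 2, dim=1)
x = torch.randn(batch, 10, dim)  # (batch, seq_len, dim) hidden states
out = x * (1 + scale[:, None, :]) + shift[:, None, :]  # broadcast over seq_len
print(out.shape)  # torch.Size([4, 10, 8])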

Contributor Author

Actually that caused some of the tests to fail. I will take a look at that.

Contributor Author

@will-rice will-rice Apr 5, 2024

This is the test that fails. Is it intended that AdaLayerNorm doesn't support a batch dimension? If you add a batch dimension to this test, it passes with my changes.

timestep_1 = torch.tensor(1, dtype=torch.long).to(torch_device)
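
For example, the batched variant I tried locally (illustrative, not part of the repo):

timestep_1 = torch.tensor([1], dtype=torch.long).to(torch_device)  # shape (1,) instead of a scalar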

Member

Probably because of how the integration was done. During that process, the number one priority is to get the model integrated into the library, so exhaustiveness isn't prioritized at all.

Member

@sayakpaul sayakpaul left a comment

Thanks for the PR. Left some comments.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@will-rice
Contributor Author

Something I want to add is that I'm not partial to ada_norm here. I just want to use DiT without needing class labels or the additional arguments for PixArtAlphaCombinedTimestepSizeEmbeddings.

@github-actions
Contributor

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the "stale" label on May 5, 2024
@sayakpaul
Member

Gently pinging @yiyixuxu

@github-actions github-actions bot removed the "stale" label on Sep 14, 2024
@github-actions
Contributor

github-actions bot commented Oct 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the "stale" label on Oct 9, 2024
@will-rice will-rice closed this Oct 16, 2024
@pranayj77

Was this ever fixed?

@pranayj77

@will-rice @sayakpaul

@will-rice
Contributor Author

I'm not sure if this specifically was fixed, but there are several DiT models to choose from now.

@pranayj77

Got it, thanks. I'll use DiTTransformer2DModel or something else.

@will-rice
Contributor Author

That's what I ended up doing.

@wufeim

wufeim commented Sep 9, 2025

Doesn't DiTTransformer2DModel still ask for class_labels, since ada_norm is not properly implemented or tested?

# Validate inputs.
if norm_type != "ada_norm_zero":
    raise NotImplementedError(
        f"Forward pass is not implemented when `patch_size` is not None and `norm_type` is '{norm_type}'."
    )
elif norm_type == "ada_norm_zero" and num_embeds_ada_norm is None:
    raise ValueError(
        f"When using a `patch_size` and this `norm_type` ({norm_type}), `num_embeds_ada_norm` cannot be None."
    )
